What is REGEX?
1) Short for regular expression, a regex is a string of text that describes a pattern used to match, locate, and manage text.
2) Perl is a great example of a programming language that utilizes regular expressions.
3) A database management system (DBMS) utilizes queries to pull information out of the database, and many query languages support regular-expression matching.
4) However, Perl is only one of the many places where you can find regular expressions. Regular expressions can also be used from the command line and in text editors to find text within a file.
5) Regular expressions are useful in search and replace operations.
6) The typical use case is to look for a substring that matches a pattern and replace it with something else. Most APIs that use regular expressions let you reference capture groups from the search pattern in the replacement string.
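As a rough illustration of search and replace with capture groups, the sketch below uses Python's built-in re module. The sample sentence and the DD/MM/YYYY date layout are made up for this example and are not taken from the text above.

```python
import re

# Hypothetical input: dates written as DD/MM/YYYY.
text = "Released on 25/12/2023 and patched on 03/01/2024."

# Three capture groups: day, month, year.
pattern = r"(\d{2})/(\d{2})/(\d{4})"

# \3, \2 and \1 in the replacement string refer back to the captured groups,
# so each date is rewritten as YYYY-MM-DD.
result = re.sub(pattern, r"\3-\2-\1", text)

print(result)  # Released on 2023-12-25 and patched on 2024-01-03.
```

Other languages and editors offer the same idea, but the syntax for referring to a group in the replacement (for example \1 versus $1) differs from tool to tool.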
How to write a regular expression?
1) Repeaters : * , + and { }:
These symbols act as repeaters and tell the computer how many times the preceding character (or group of characters) may occur.
2) The asterisk symbol ( * ):
It tells the computer to match the preceding character (or set of characters) 0 or more times, with no upper limit.
3) The Plus symbol ( + ):
It tells the computer to match the preceding character (or set of characters) one or more times, with no upper limit.
4) The curly braces {…}:
It tells the computer to repeat the preceding character (or set of characters) as many times as the value inside the braces: {n} means exactly n times, {n,} means at least n times, and {n,m} means between n and m times.
5) Wildcard - ( . ):
The dot symbol can take the place of any other character, which is why it is called the wildcard character. In most regex engines it does not match a newline by default.
6) Optional character - ( ? ):
This symbol tells the computer that the preceding character may or may not be present in the string to be matched.
7) The caret ( ^ ) symbol:
It sets the position for the match: it tells the computer that the match must start at the beginning of the string or line.
8) The dollar ( $ ) symbol:
It tells the computer that the match must occur at the end of the string or before \n at the end of the line or string.
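The symbols listed above can be tried out with a short sketch in Python's re module. The helper names full and found and all sample strings are invented for this illustration; other regex engines may differ in small details.

```python
import re

def full(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

def found(pattern, text):
    """Return True when pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

checks = {
    # * : zero or more of the preceding character
    "star_zero": full(r"ab*c", "ac"),            # True
    "star_many": full(r"ab*c", "abbbc"),         # True
    # + : one or more of the preceding character
    "plus_zero": full(r"ab+c", "ac"),            # False
    "plus_one": full(r"ab+c", "abc"),            # True
    # {n} and {n,m} : an exact count or a range of counts
    "exactly_two": full(r"ab{2}c", "abbc"),      # True
    "range_high": full(r"ab{2,3}c", "abbbbc"),   # False, four b's is too many
    # . : any single character (except a newline by default)
    "dot_any": full(r"a.c", "a-c"),              # True
    "dot_needs_one": full(r"a.c", "ac"),         # False
    # ? : the preceding character is optional
    "optional_absent": full(r"colou?r", "color"),    # True
    "optional_present": full(r"colou?r", "colour"),  # True
    # ^ : the match must start at the beginning
    "caret_start": found(r"^cat", "catalog"),    # True
    "caret_middle": found(r"^cat", "bobcat"),    # False
    # $ : the match must end at the end, or just before a final \n
    "dollar_end": found(r"cat$", "bobcat"),      # True
    "dollar_newline": found(r"cat$", "bobcat\n"),  # True
}

for name, outcome in checks.items():
    print(f"{name}: {outcome}")
```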
Components of REGEX :
1) A character class: matches any one of a set of characters. It is used to match the most basic elements of a language, such as a letter, a digit, a space, or a symbol.
\s : matches any whitespace character, such as a space or a tab
\S : matches any non-whitespace character
\d : matches any digit character
\D : matches any non-digit character
\w : matches any word character (letters, digits, and the underscore)
\W : matches any non-word character
\b : matches a word boundary, that is, the position between a word character and a non-word character (such as a space, dash, comma, or semicolon) or the start or end of the string; it does not consume any characters
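A small sketch of these classes using Python's re module follows. The sample sentence is invented for the illustration.

```python
import re

sample = "Order 66: ship 3 crates, ASAP!"

digits = re.findall(r"\d+", sample)        # runs of digits
words = re.findall(r"\w+", sample)         # runs of word characters
chunks = re.findall(r"\S+", sample)        # runs of non-whitespace
spaces = re.findall(r"\s", sample)         # each whitespace character
digits_only = re.sub(r"\D", "", sample)    # delete every non-digit
punctuation = re.findall(r"[^\w\s]", sample)  # neither word nor whitespace

# \b is a position, not a character: "ship" is found as a whole word,
# but not inside "shipment".
whole_words = re.findall(r"\bship\b", "ship shipment ship-to")

print(digits)       # ['66', '3']
print(words)        # ['Order', '66', 'ship', '3', 'crates', 'ASAP']
print(chunks)       # ['Order', '66:', 'ship', '3', 'crates,', 'ASAP!']
print(len(spaces))  # 5
print(digits_only)  # 663
print(punctuation)  # [':', ',', '!']
print(whole_words)  # ['ship', 'ship']
```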
The Escape Symbol : \
If you want to match the actual '+', '.' etc. characters, add a backslash ( \ ) before that character. This tells the computer to treat the following character as a literal character rather than as a special symbol.
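The effect of escaping can be seen in a short Python sketch. The sample strings are made up; re.escape is a helper in Python's re module that adds the backslashes for you, and other languages may or may not offer an equivalent.

```python
import re

candidates = "3.50 3x50"

# Unescaped, the dot is a wildcard and also matches the 'x'.
unescaped = re.findall(r"3.50", candidates)

# Escaped, the dot only matches a literal '.'.
escaped = re.findall(r"3\.50", candidates)

# An escaped plus sign matches the '+' character itself.
plus_signs = re.findall(r"\+", "1+1+1")

# re.escape() builds a pattern that matches a string literally.
literal = "a.b*c"
literal_pattern = re.escape(literal)
matches_itself = re.fullmatch(literal_pattern, literal) is not None
matches_other = re.fullmatch(literal_pattern, "aXbc") is not None

print(unescaped)       # ['3.50', '3x50']
print(escaped)         # ['3.50']
print(plus_signs)      # ['+', '+']
print(matches_itself)  # True
print(matches_other)   # False
```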
Grouping Characters ( )
Several symbols of a regular expression can be grouped together so that they act as a single unit and behave as a block. To do this, wrap that part of the regular expression in parentheses ( ).
Vertical Bar ( | ) :
Matches any one element separated by the vertical bar (|) character.
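Grouping and the vertical bar can be sketched together in Python's re module. The helper name full and the sample strings are invented for this illustration.

```python
import re

def full(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# With parentheses, + repeats the whole block "ha".
group_repeat = full(r"(ha)+", "hahaha")

# Without parentheses, + only repeats the single character 'a'.
char_repeat = full(r"ha+", "hahaha")
char_repeat_ok = full(r"ha+", "haaa")

# The vertical bar matches any one of the alternatives.
pets = re.findall(r"cat|dog", "cat, dog, bird")

# A group limits how far the alternation reaches.
grey = full(r"gr(a|e)y", "grey")
gray = full(r"gr(a|e)y", "gray")
groy = full(r"gr(a|e)y", "groy")

print(group_repeat, char_repeat, char_repeat_ok)  # True False True
print(pets)                                       # ['cat', 'dog']
print(grey, gray, groy)                           # True True False
```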
\number :
Backreference: allows a previously matched sub-expression (an expression captured within parentheses) to be referred to later in the same regular expression. \n, where n is a number, means that the text matched by the n-th group must appear again at the current position.
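A common use of a backreference is spotting a word that was typed twice in a row. The sketch below uses Python's re module, and the sample sentence is made up for the illustration.

```python
import re

text = "This is is a test of of the system."

# (\w+) captures a word; \1 requires the same word again after a space.
# The \b anchors stop the pattern from matching inside longer words.
doubled_pattern = r"\b(\w+) \1\b"

doubled = re.findall(doubled_pattern, text)

# In the replacement string, \1 puts the captured word back once.
cleaned = re.sub(doubled_pattern, r"\1", text)

print(doubled)  # ['is', 'of']
print(cleaned)  # This is a test of the system.
```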
Comment : (?# comment)
Inline comment: The comment ends at the first closing parenthesis.
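Python's re module accepts this inline comment syntax, so it can be used for a quick sketch. Support varies between regex engines, and the phone-number style pattern and sample text here are invented for the illustration.

```python
import re

# Each (?# ...) comment is ignored by the engine. Because a comment ends at
# the first closing parenthesis, it must not contain a ')' itself.
commented = r"\d{3}(?# first three digits)-\d{4}(?# last four digits)"
plain = r"\d{3}-\d{4}"

sample = "Call 555-0199 today"

with_comments = re.search(commented, sample).group()
without_comments = re.search(plain, sample).group()

print(with_comments)                      # 555-0199
print(with_comments == without_comments)  # True
```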
2) Groups
3) Quantifiers
4) Backreferences
5) Anchors, Boundaries, Delimiters
6) Lookarounds
7) Modifiers